
    Scalable and Compact 3D Action Recognition with Approximated RBF Kernel Machines

    Despite the recent deep learning (DL) revolution, kernel machines remain powerful methods for action recognition. DL has brought the use of large datasets, which is typically a problem for kernel approaches, since they do not scale up efficiently due to their kernel Gram matrices. Nevertheless, kernel methods remain attractive and more broadly applicable, since they can manage datasets of different sizes equally well, including cases where DL techniques show limitations. This work investigates these issues by proposing an explicit approximated representation that, together with a linear model, is an equivalent, yet scalable, implementation of a kernel machine. Our approximation is directly inspired by the exact feature map induced by an RBF Gaussian kernel but, unlike the latter, it is finite dimensional and very compact. We justify the soundness of our idea with a theoretical analysis which proves the unbiasedness of the approximation and provides a vanishing bound for its variance, which is shown to decrease much more rapidly than in alternative methods in the literature. In a broad experimental validation, we assess the superiority of our approximation in terms of 1) ease and speed of training, 2) compactness of the model, and 3) improvements with respect to the state-of-the-art performance.
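    The abstract does not spell out the construction, so as a point of reference here is the best-known recipe in this family, random Fourier features (Rahimi and Recht): an explicit finite-dimensional map whose inner products approximate an RBF Gaussian kernel, so that a linear model on the mapped data stands in for the kernel machine without ever forming the Gram matrix. This is a baseline sketch, not the paper's specific approximation.

```python
import numpy as np

def rff_map(X, n_features=256, gamma=1.0, seed=0):
    """Explicit map z(x) with z(x).z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# A linear model trained on Z approximates the RBF kernel machine without
# ever materializing the n-by-n Gram matrix.
X = np.random.randn(500, 32)   # e.g. 500 action descriptors of dimension 32
Z = rff_map(X)                 # finite-dimensional explicit features
K_approx = Z @ Z.T             # close to the exact RBF Gram matrix
```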

    Intra-Camera Supervised Person Re-Identification

    Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity-labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand, unsupervised re-id methods do not need identity label information, but they usually suffer from markedly inferior model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on the idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation effort. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the cost-effectiveness superiority of our method over the alternative approaches on three large person re-id datasets. For example, MATE yields an 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
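    A minimal sketch of the core structural idea, per-camera identity classification heads over a shared representation, is given below; the backbone, layer sizes, and identity counts are placeholder assumptions, not MATE's actual architecture.

```python
import torch
import torch.nn as nn

class PerCameraHeads(nn.Module):
    """Shared feature extractor with one identity classifier per camera,
    so each camera's labels are used only within its own task."""
    def __init__(self, feat_dim, ids_per_camera):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(2048, feat_dim), nn.ReLU())
        # One classification head per camera, over that camera's own labels.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, n) for n in ids_per_camera)

    def forward(self, x, cam_id):
        return self.heads[cam_id](self.backbone(x))

model = PerCameraHeads(feat_dim=256, ids_per_camera=[120, 95, 140])
logits = model(torch.randn(8, 2048), cam_id=1)     # a batch from camera 1
loss = nn.functional.cross_entropy(logits, torch.randint(0, 95, (8,)))
```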

    Bio-inspired relevant interaction modelling in cognitive crowd management

    Cognitive algorithms, integrated in intelligent systems, represent an important innovation in designing interactive smart environments. More specifically, Cognitive Systems have important applications in anomaly detection and management in advanced video surveillance. These algorithms mainly address the problem of modelling interactions and behaviours among the main entities in a scene. A bio-inspired structure is here proposed, which is able to encode and synthesize signals, not only for the description of single entities' behaviours, but also for modelling cause–effect relationships between user actions and changes in environment configurations. Such models are stored within a memory (Autobiographical Memory) during a learning phase. Here the system operates an effective knowledge transfer from a human operator towards an automatic system called Cognitive Surveillance Node (CSN), which is part of a complex cognitive JDL-based and bio-inspired architecture. After such a knowledge-transfer phase, learned representations can be used, at different levels, either to support human decisions, by detecting anomalous interaction models and thus compensating for human shortcomings, or, in an automatic decision scenario, to identify anomalous patterns and choose the best strategy to preserve the stability of the entire system. Results are presented in a video surveillance scenario, where the CSN can observe two interacting entities consisting of a simulated crowd and a human operator. These can interact within a visual 3D simulator, where crowd behaviour is modelled by means of Social Forces. The way anomalies are detected and consequently handled is demonstrated, on synthetic and also on real video sequences, in both the user-support and automatic modes.
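    Since crowd behaviour is modelled by means of Social Forces, that component can be illustrated compactly; the following is a simplified Helbing-style social-force update with illustrative parameter values, not the exact model used by the CSN.

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.1, v0=1.3, tau=0.5, A=2.0, B=0.3):
    """One Euler step of a simplified Helbing social-force crowd model."""
    n = len(pos)
    # Driving force: relax each agent's velocity toward its desired velocity.
    desired = goals - pos
    desired /= np.linalg.norm(desired, axis=1, keepdims=True) + 1e-9
    force = (v0 * desired - vel) / tau
    # Pairwise repulsion: agents push each other away, decaying with distance.
    for i in range(n):
        diff = pos[i] - np.delete(pos, i, axis=0)
        dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-9
        force[i] += (A * np.exp(-dist / B) * diff / dist).sum(axis=0)
    vel = vel + dt * force
    return pos + dt * vel, vel

pos = np.random.rand(20, 2) * 10            # 20 agents in a 10 x 10 area
vel = np.zeros((20, 2))
goals = np.full((20, 2), [10.0, 5.0])        # all agents head for one exit
pos, vel = social_force_step(pos, vel, goals)
```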

    Expression of the Stress Response Oncoprotein LEDGF/p75 in Human Cancer: A Study of 21 Tumor Types

    Oxidative stress-modulated signaling pathways have been implicated in carcinogenesis and therapy resistance. The lens epithelium-derived growth factor p75 (LEDGF/p75) is a transcription co-activator that promotes resistance to stress-induced cell death. This protein has been implicated in inflammatory and autoimmune conditions, HIV-AIDS, and cancer. Although LEDGF/p75 is emerging as a stress-survival oncoprotein, there is scarce information on its expression in human tumors. The present study was performed to evaluate its expression in a comprehensive panel of human cancers. Transcript expression was examined in the Oncomine cancer gene microarray database and in a TissueScan Cancer Survey Panel quantitative polymerase chain reaction (Q-PCR) array. Protein expression was assessed by immunohistochemistry (IHC) in cancer tissue microarrays (TMAs) containing 1735 tissues representing single or replicate cores from 1220 individual cases (985 tumor and 235 normal tissues). A total of 21 major cancer types were analyzed. Analysis of LEDGF/p75 transcript expression in Oncomine datasets revealed significant upregulation (tumor vs. normal) in 15 out of 17 tumor types. The TissueScan Cancer Q-PCR array revealed significantly elevated LEDGF/p75 transcript expression in prostate, colon, thyroid, and breast cancers. IHC analysis of TMAs revealed significantly increased levels of LEDGF/p75 protein in prostate, colon, thyroid, liver and uterine tumors, relative to corresponding normal tissues. Elevated transcript or protein expression of LEDGF/p75 was observed in several tumor types. These results further establish LEDGF/p75 as a cancer-related protein, and provide a rationale for ongoing studies aimed at understanding the clinical significance of its expression in specific human cancers.
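    For readers unfamiliar with how relative transcript levels are typically derived from a Q-PCR array, the sketch below shows the standard 2^-ΔΔCt calculation with made-up Ct values; it illustrates the arithmetic only and does not reproduce the study's data or its exact analysis pipeline.

```python
# Standard relative-quantification arithmetic for Q-PCR (the 2^-ΔΔCt method);
# all Ct values here are invented for illustration.
ct_target_tumor, ct_ref_tumor = 24.1, 18.3    # target gene vs reference gene, tumor
ct_target_normal, ct_ref_normal = 26.0, 18.1  # same pair in matched normal tissue

d_ct_tumor = ct_target_tumor - ct_ref_tumor    # normalize to the reference gene
d_ct_normal = ct_target_normal - ct_ref_normal
dd_ct = d_ct_tumor - d_ct_normal               # tumor relative to normal
fold_change = 2 ** (-dd_ct)                    # ~4.3-fold upregulation here
```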

    Learning with privileged information via adversarial discriminative modality distillation

    Heterogeneous data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while training data can be accurately collected to include a variety of sensory modalities, it is often the case that not all of them are available in real-life (testing) scenarios, where a model has to be deployed. This raises the challenge of how to extract information from multimodal data in the training stage, in a form that can be exploited at test time, considering limitations such as noisy or missing modalities. This paper presents a new approach in this direction for RGB-D vision tasks, developed within the adversarial learning and privileged information frameworks. We consider the practical case of learning representations from depth and RGB videos, while relying only on RGB data at test time. We propose a new approach to train a hallucination network that learns to distill depth information via adversarial learning, resulting in a clean approach without several losses to balance or hyperparameters. We report state-of-the-art results for object classification on the NYUD dataset, and video action recognition on the largest multimodal dataset available for this task, the NTU RGB+D, as well as on the Northwestern-UCLA dataset.
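    The following is a minimal sketch of the adversarial distillation idea described above: a hallucination network maps RGB features into the depth-feature space while a discriminator tries to tell hallucinated features from real ones. The shapes and single-layer modules are placeholder assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

feat = 128
halluc = nn.Linear(512, feat)              # RGB -> hallucinated depth features
disc = nn.Sequential(nn.Linear(feat, 1))   # real-vs-hallucinated critic
opt_h = torch.optim.Adam(halluc.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

rgb_feats = torch.randn(16, 512)           # stand-ins for RGB backbone outputs
depth_feats = torch.randn(16, feat)        # stand-ins for the frozen depth stream

# Discriminator step: real depth features -> 1, hallucinated features -> 0.
fake = halluc(rgb_feats)
d_loss = bce(disc(depth_feats), torch.ones(16, 1)) + \
         bce(disc(fake.detach()), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Hallucination step: fool the discriminator (no extra losses to balance).
g_loss = bce(disc(halluc(rgb_feats)), torch.ones(16, 1))
opt_h.zero_grad(); g_loss.backward(); opt_h.step()
```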

    Generative pseudo-label refinement for unsupervised domain adaptation

    We investigate and characterize the inherent resilience of conditional Generative Adversarial Networks (cGANs) against noise in their conditioning labels, and exploit this fact in the context of Unsupervised Domain Adaptation (UDA). In UDA, a classifier trained on the labelled source set can be used to infer pseudo-labels on the unlabelled target set. However, this will result in a significant amount of misclassified examples (due to the well-known domain shift issue), which can be interpreted as noise injection in the ground-truth labels for the target set. We show that cGANs are, to some extent, robust against such "shift noise". Indeed, cGANs trained with noisy pseudo-labels are able to filter such noise and generate cleaner target samples. We exploit this finding in an iterative procedure where a generative model and a classifier are jointly trained: in turn, the generator allows sampling cleaner data from the target distribution, and the classifier allows associating better labels with target samples, progressively refining the target pseudo-labels. Results on common benchmarks show that our method performs better than, or comparably with, the unsupervised domain adaptation state of the art.
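    A skeleton of the iterative procedure, under the assumption that a classifier and a cGAN are available as black boxes, might look as follows; the four callables are hypothetical placeholders, and only the alternation between the two models is the point of the sketch.

```python
def refine_pseudo_labels(target_x, init_labels, classifier_fit,
                         classifier_predict, cgan_fit, cgan_sample, n_rounds=3):
    """Alternate a cGAN and a classifier to progressively clean pseudo-labels.

    The callables stand in for real training/inference routines.
    """
    labels = init_labels
    for _ in range(n_rounds):
        # 1. Train the cGAN on target data with the current (noisy) pseudo-
        #    labels; its conditional generator filters part of the label noise.
        gan = cgan_fit(target_x, labels)
        clean_x, clean_y = cgan_sample(gan)
        # 2. Retrain the classifier on the cleaner generated samples and
        #    re-infer pseudo-labels for the real target set.
        clf = classifier_fit(clean_x, clean_y)
        labels = classifier_predict(clf, target_x)
    return labels
```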

    Audio-Visual Localization by Synthetic Acoustic Image Generation

    Acoustic images constitute an emergent data modality for multimodal scene understanding. Such images have the peculiarity of distinguishing the spectral signature of sounds coming from different directions in space, thus providing richer information than that derived from mono and binaural microphones. However, acoustic images are typically generated by cumbersome microphone arrays, which are not as widespread as ordinary microphones mounted on optical cameras. To exploit this richer modality while using standard microphones and cameras, we propose to leverage the generation of synthetic acoustic images from common audio-video data for the task of audio-visual localization. The generation of synthetic acoustic images is obtained by a novel deep architecture, based on Variational Autoencoder and U-Net models, which is trained to reconstruct the ground-truth spatialized audio data collected by a microphone array from the associated video and its corresponding monaural audio signal. Namely, the model learns how to mimic what an array of microphones can produce under the same conditions. We assess the quality of the generated synthetic acoustic images on the task of unsupervised sound source localization in a qualitative and quantitative manner, while also considering standard generation metrics. Our model is evaluated on both multimodal datasets containing acoustic images, used for training, and unseen datasets containing just monaural audio signals and RGB frames, and reaches more accurate localization results than the state of the art.
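    As a rough illustration of the generator's structure, the sketch below wires a toy conditional VAE that encodes fused video and monaural-audio features and decodes a multi-band "acoustic image"; all shapes, the number of frequency bands, and the plain linear layers are assumptions standing in for the paper's VAE/U-Net design.

```python
import torch
import torch.nn as nn

class AcousticImageVAE(nn.Module):
    """Toy stand-in: encode fused video + mono-audio features, decode a
    spatialized multi-band acoustic image."""
    def __init__(self, in_dim=512, z_dim=64, out_hw=(36, 48), bands=12):
        super().__init__()
        self.out_shape = (bands, *out_hw)
        self.enc = nn.Linear(in_dim, 2 * z_dim)                 # -> (mu, logvar)
        self.dec = nn.Linear(z_dim, bands * out_hw[0] * out_hw[1])

    def forward(self, video_feat, audio_feat):
        h = self.enc(torch.cat([video_feat, audio_feat], dim=1))
        mu, logvar = h.chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        return self.dec(z).view(-1, *self.out_shape), mu, logvar

model = AcousticImageVAE()
acoustic_img, mu, logvar = model(torch.randn(4, 256), torch.randn(4, 256))
recon = nn.functional.mse_loss(acoustic_img, torch.randn(4, 12, 36, 48))
kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()     # standard VAE KL
```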

    Event based switched dynamic bayesian networks for autonomous cognitive crowd monitoring

    Human behavior analysis is one of the most important applications in the Intelligent Video Surveillance (IVS) field. In most recent systems addressed by research, automatic support to human decisions based on object detection, tracking and situation assessment tools is integrated as part of a complete cognitive artificial process, including the security maintenance procedures and actions that are in the scope of the system. In such cases an IVS needs to represent complex situations that describe alternative possible real-time interactions between the dynamic observed situation and operators' actions. To obtain such knowledge, particular types of Event-based Dynamic Bayesian Networks (E-DBNs) are here proposed that can switch among alternative Bayesian filtering and control lower-level modules to capture adaptive reactions of human operators. It is shown that, after the offline learning phase, Switched E-DBNs can be used to represent and anticipate possible operators' actions within the IVS. In this sense, acquired knowledge can be used either for fully autonomous security-preserving systems or for training of new operators. Results are shown by considering a crowd monitoring application in a critical infrastructure. A system is presented where a Cognitive Node, embedding Switched E-DBN knowledge in a structured way, can interact with an active visual simulator of crowd situations. It is also shown that outputs from such a simulator can be easily compared with video signals coming from real cameras and processed by typical Bayesian tracking methods.
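    To make the switching idea concrete, the sketch below propagates a toy crowd state under one of two linear dynamic models selected by a detected event, echoing how a Switched E-DBN would alternate among Bayesian filtering modules; the state variables, matrices, and event names are invented for illustration.

```python
import numpy as np

# Two toy linear dynamic models, selected by a discrete event variable.
A_modes = {
    "calm":     np.array([[1.0, 0.1], [0.0, 1.0]]),   # slow drift
    "evacuate": np.array([[1.0, 0.5], [0.0, 1.2]]),   # accelerating flow
}

def step(state, event, noise_std=0.05):
    """Propagate a 2-D crowd state under the dynamics chosen by the event."""
    return A_modes[event] @ state + np.random.normal(0, noise_std, size=2)

state = np.array([0.0, 0.2])                 # e.g. [density deviation, mean speed]
for event in ["calm", "calm", "evacuate"]:   # detected operator/scene events
    state = step(state, event)
```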

    A bio-inspired knowledge representation method for anomaly detection in cognitive video surveillance systems

    Human behaviour analysis has important applications in the field of anomaly management, such as Intelligent Video Surveillance (IVS). As the number of individuals in a scene increases, however, new macroscopic complex behaviours emerge from the underlying interaction network among multiple agents. This phenomenon has lately been investigated by modelling such interaction through Social Forces. In most recent Intelligent Video Surveillance systems, mechanisms to support human decisions are integrated in cognitive artificial processes. These algorithms mainly address the problem of modelling behaviours to allow for inference and prediction over the environment. A bio-inspired structure is here proposed, which is able to encode and synthesize signals, not only for the description of single entities' behaviours, but also for modelling cause-effect relationships between user actions and changes in environment configurations (i.e. the crowd). Such models are stored within a memory during a learning phase. Here the system operates an effective knowledge transfer from a human operator towards an automatic system called Cognitive Surveillance Node (CSN), which is part of a complex cognitive JDL-based and bio-inspired architecture. After such a knowledge-transfer phase, learned representations can be used, at different levels, either to support human decisions by detecting anomalous interaction models and thus compensating for human shortcomings, or, in an automatic decision scenario, to identify anomalous patterns and choose the best strategy to preserve the stability of the entire system. Results are presented where crowd behaviour is modelled by means of Social Forces and can interact with a human operator within a visual 3D simulator. The way anomalies are detected and consequently handled is demonstrated on synthetic data and also on a real video sequence, in both the user-support and automatic modes.
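    A minimal sketch of the anomaly test implied by this pipeline: observed interaction signatures are compared against the models stored in the autobiographical memory and flagged when nothing stored is sufficiently close. The feature vectors and threshold are illustrative assumptions, not the CSN's actual representation.

```python
import numpy as np

# Learned "normal" interaction models stored during the learning phase
# (toy 2-D signatures for illustration).
memory = [np.array([0.2, 1.1]), np.array([0.8, 0.4])]
threshold = 0.5

def is_anomalous(signature):
    """Flag a signature when no stored model lies within the threshold."""
    return min(np.linalg.norm(signature - m) for m in memory) > threshold

print(is_anomalous(np.array([0.25, 1.0])))   # False: matches a stored model
print(is_anomalous(np.array([2.0, 2.0])))    # True: far from all stored models
```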